PreLab 07

Concordances
Due in class on Wednesday, October 25

In Lab 6 you will create a concordance. What is a concordance? It is an index to the words of a text or of a body of texts. For example, if you are writing an essay about Shakespeare's view of kingship, you might want to look at the instances in his plays where the word "king" is used. There are a lot of these instances. You can find them all by looking at a concordance to Shakespeare -- look up the word "king" and you will get references, by Play, Act, Scene and Line Number, to every use of this word in every one of Shakespeare's plays. The Oberlin College library has concordances to Shakespeare and Donne and Chaucer and Dante and Vergil and Plato and even to Joyce's Finnegans Wake. It has several concordances to the Bible, and to the Qur'an and the Guanzi. In fact, the library has more than 150 books whose titles start "A concordance to ..."

One of the issues that the creator of a concordance faces is how to refer to a specific use of a word. We are going to take the easy way out and just use line numbers. This is great for making a concordance to a single poem, and less practical for a novel. Here is one small portion of the output of our concordance for The Love Song of J. Alfred Prufrock by T.S. Eliot:

etherized 3
evening 2 17 77
evenings 50
eyes 55 56

So the word "etherized" appears on line 3, "evening" appears 3 times, on lines 2, 17 and 77, and so forth. In this lab you will write a program that asks the user for the name of a text file, and then prints a concordance of the text in that file.
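
One way to picture a concordance inside a program is as a dictionary that maps each word to the list of line numbers where it appears. The sketch below is only an illustration: the variable name concordance and the helper format_entry are made up for this example, and the data is just the Prufrock excerpt shown above, typed in by hand.

```python
# Hypothetical representation: each word maps to the list of
# line numbers on which it appears (repeats are allowed).
concordance = {
    "etherized": [3],
    "evening": [2, 17, 77],
    "evenings": [50],
    "eyes": [55, 56],
}


def format_entry(word, line_numbers):
    """Return one output line: the word followed by its line numbers."""
    return word + " " + " ".join(str(n) for n in line_numbers)


# Print the entries in alphabetical order, one word per line.
for word in sorted(concordance):
    print(format_entry(word, concordance[word]))
```

Other representations would work too; a dictionary of lists is simply a convenient fit for the "word, then line numbers" layout of the output.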

We will use a few conventions:

  1. We will start numbering the lines of the file at 1.
  2. We will only number the non-blank lines.
  3. We will remove all punctuation from the front and back of words but leave the internal punctuation, which mostly consists of hyphens (as in "left-handed") and apostrophes (as in "we're").
  4. We will translate all words to lowercase.
  5. We will print the words of the concordance in alphabetical order.
  6. At the end of the output we will print the number of lines and the number of different words in the file.
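
Conventions 3 and 4 can be sketched in a few lines of Python. This is a minimal sketch, not the required solution: the function name clean_word is hypothetical, and it assumes that "punctuation" means the ASCII characters in Python's string.punctuation, so curly quotes and other non-ASCII marks would not be stripped.

```python
import string


def clean_word(token):
    """Strip punctuation from both ends of a token, then lowercase it.

    Assumption: punctuation means the ASCII characters in
    string.punctuation. Internal punctuation is left alone.
    """
    return token.strip(string.punctuation).lower()


# Try it on a few tokens, including one that is nothing but punctuation.
for token in ["one!!", "Five!", "'five", "we're", "left-handed", "--"]:
    print(repr(token), "->", repr(clean_word(token)))
```

A token made up entirely of punctuation, such as "--", comes back as the empty string, so a program can test for that and skip it.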

For example, consider a file with the following text:

one!!

Two Two
!!!! --
four four four four

five five Five! 'five five

Here is the output we want from this file:

five 5 5 5 5 5
four 4 4 4 4
one 1
two 2 2
I found 5 lines containing 4 unique words.

The word "one" appears once on the first line of the file; "two" appears twice on the line numbered 2 (we ignored the blank line between 1 and 2). There is a line 3, but the "words" on it consist only of punctuation characters so they are never added to the concordance.
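
To check the reasoning above, here is a rough sketch that applies the conventions to the sample text. It is one possible approach, not the required design: the function name build_concordance is hypothetical, the lines are given as a list rather than read from a file, and punctuation is assumed to mean the ASCII characters in string.punctuation.

```python
import string


def build_concordance(lines):
    """Return (concordance, line_count) for a list of text lines.

    The concordance maps each cleaned word to the list of line
    numbers on which it appears. Only non-blank lines are numbered.
    """
    concordance = {}
    line_number = 0
    for line in lines:
        if line.strip() == "":
            continue  # blank lines are not numbered
        line_number += 1
        for token in line.split():
            word = token.strip(string.punctuation).lower()
            if word:  # skip tokens that were only punctuation
                concordance.setdefault(word, []).append(line_number)
    return concordance, line_number


sample = [
    "one!!",
    "",
    "Two Two",
    "!!!! --",
    "four four four four",
    "",
    "five five Five! 'five five",
]

concordance, line_count = build_concordance(sample)
for word in sorted(concordance):
    print(word, *concordance[word])
print(f"I found {line_count} lines containing {len(concordance)} unique words.")
```

Note that line 3 is still counted as a line even though it adds no words, which is why the summary reports 5 lines.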

Now you do it. The only thing for you to hand in for this prelab is a concordance (you don't need to alphabetize the words) for the start of Dr. Seuss's One Fish Two Fish. Here is the text:

One fish two fish
Red fish blue fish.
Black fish blue fish
Old fish new fish.

This one has a little star.
This one has a little car.
Say! what a lot of fish there are.

 

Honor Code

If you followed the Honor Code in this assignment, write the following sentence at the top of your homework to attest to that fact.

I affirm that I have adhered to the Honor Code in this assignment.